Executive Summary



School systems across the nation have adopted policies that reward or sanction particular schools on the basis of their 
students' performance on standardized math and reading tests. One of the most frequently raised concerns regarding 
such "high-stakes testing" policies is that they oblige schools to focus on subjects for which they are held accountable 
but to neglect the rest. Many have worried that the limited focus of these policies could have an unintended negative 
effect on student proficiency in other subjects, such as science, that are important to the development of human 
capital and thus to future economic growth. 

This paper uses a regression discontinuity design utilizing student-level data to evaluate the impact of sanctions under 
Florida's high-stakes testing policy on student proficiency in science. Under that state's A+ program, every public school 
receives a letter grade from A to F that is based primarily upon its students' performance on the state's standardized 
math and reading exams. Students in Florida were also administered a standardized exam in science, but this test was 
low-stakes because its results held no consequences under the A+ program or any other formal accountability policy. 

Previous research has found that the rewards and sanctions of receiving an F grade in the prior year led to improved 
gains in student proficiency in the high-stakes subjects of math and reading. The current paper is the first to evaluate 
the impact of the incentives under this high-stakes testing system on student proficiency in science. This paper adds 
to a sparse previous literature quantitatively evaluating whether high-stakes testing policies have "crowded out" 
learning in a low-stakes subject. 

The primary findings of the study are: 

• The F-grade sanction produced after one year a gain in student science proficiency of about 0.08 standard 
deviations. These gains are similar to those in reading and appear smaller than the gains in math that were due 
to the F sanction. 

• There is some evidence to suggest that student science proficiency increased primarily because student learning 
in math and reading enabled that increase. That is, learning in math and reading appears to contribute to learning 
in science. 
Building on the Basics: 

The Impact of 
High-Stakes Testing on 
Student Proficiency in 
Low-Stakes Subjects 

Marcus A. Winters, Jay P. Greene 
& Julie R. Trivitt 

INTRODUCTION 

School systems across the nation have adopted policies that 
reward or sanction particular schools on the basis of their 
students’ performance on standardized tests. Such testing 
has been a dominant force in education policy since at 
least the 1990s. More than half the states had already implemented 
some form of high-stakes test before the No Child Left Behind Act 
(NCLB) made it universal in 2002. We call a test a high-stakes test 
when there are meaningful consequences for schools or students 
that are based on how students perform on the test. 

One of the most frequently raised concerns regarding high-stakes 
testing policies is that they oblige schools to focus on subjects for 
which they are held accountable but to neglect the rest (Nichols 
and Berliner 2007; Gunzenhauser 2003; Groves 2002; Patterson 2002; 
Murillo and Flores 2002; McNeil 2000; Jones et al. 1999). The vast 
majority of these policies base their rewards or sanctions exclusively 
on the results of reading and math tests. Though some policies are 
more expansive than others, few threaten meaningful consequences 
when students fail to meet standards in subjects such as science, his- 
tory, or the arts. Failure to assure student mastery of subjects other 
than basic math and reading could have important implications for 
the future of human capital in the United States. 

If schools reallocate time and resources away from important but 
low-stakes subjects and toward the high-stakes subjects, with the 
result that students achieved in the high-stakes subjects 
at the expense of proficiency in the low-stakes 
subjects, we would say that the policy “crowded out” 
learning in the low-stakes subjects. It is important to 
note that this definition of crowding out focuses on 
learning output, not teaching inputs. In other words, 
if schools increased time spent on math or reading by 
decreasing time spent on science, we would consider 
high-stakes testing of math or reading to have crowded 
out science teaching only if students actually learned 
less science as a result. 

A substantial amount of anecdotal and qualitative 
evidence suggests that schools and teachers have 
responded to high-stakes testing by adjusting their 
teaching styles (McNeil 2000; New York State Educa- 
tion Department 2004) and by shifting focus away 
from low-stakes subjects (Center on Education Policy 
2006; Jones et al. 1999; King and Mathers 1997; Gordon 
2002; Groves 2002; Murillo and Flores 2002). However, 
there is currently very little empirical evidence of the 
impact of high-stakes testing policies on measured 
student proficiency in subjects that are not part of the 
accountability system. 

In the only quantitative evaluation of this topic of 
which we are aware, Jacob (2005) found that Chicago’s 
high-stakes testing system led to significant learning 
gains in the low-stakes subjects of science and social 
studies. However, he found that these gains were 
smaller than those in the high-stakes subjects of math 
and reading. 

In this paper, we add to the limited previous research 
by evaluating how a high-stakes testing system in 
Florida that employs sanctions affects student 
proficiency in the low-stakes subject of science and the 
high-stakes subjects of math and reading. There are two 
important reasons to research this question in a system 
other than Chicago’s. First, by evaluating the impact 
of sanctions under high-stakes testing on student 
proficiency in low-stakes subjects in another school 
system, we can help determine whether the results in 
Chicago are limited to that area or hold more gener- 
ally. Second, it is important to investigate outcomes 
in another system, since some research has found 
systematic manipulations of Chicago’s high-stakes 
exams that could skew results (Jacob 2005; Jacob and 
Levitt 2003). Previous research in Florida found that the 
results of that state’s high-stakes exams have not been 
systematically manipulated and are generally reliable 
indicators of student proficiency (Greene, Winters, and 
Forster 2004; West and Peterson 2006). 

Florida’s high-stakes testing program is also worth 
studying because its accountability system, unlike 
many others, makes it 
possible to use a rigorous “regression discontinuity” 
design, which allows for a causal test of the impact 
of the program’s sanctions. Beginning in the 2001-02 
school year, schools received letter grades reflecting 
points earned under an elaborate system for captur- 
ing several aspects of a school’s performance. As 
described below, Florida imposes meaningful sanc- 
tions only when a school receives a failing grade. We 
follow the strategy of a previous paper by Rouse et 
al. (2007) that uses the change in the policy to control 
for the heterogeneity of schools that receive a failing 
or passing grade. 

We find that students attending schools designated 
as failing in the prior year made greater gains on the 
state’s science exam than they would have done if their 
school had not received the F sanction. The gains that 
students made in science were similar to those that 
previous research (which we replicate here) has found 
that students made in the high-stakes subjects of math 
and reading. These findings suggest that the incentives 
of Florida’s high-stakes testing program have not led 
to significant crowding out of student knowledge in 
the low-stakes subject of science. 

At first, our results may seem counterintuitive, in that 
high-stakes testing in only certain subjects would be 
expected to lead schools to focus on those areas. In 
fact, encouraging schools to shift their priorities to- 
ward subjects commonly recognized as academically 
important (i.e., math and reading) is arguably one of 
the purposes of the policy. 

There are two reasons that high-stakes testing might 
instead have a positive effect on student achievement 
in low-stakes subjects. First, the pressure of account- 
ability testing could lead schools to adopt reforms that 
improve their overall quality. For example, a school 
could more effectively motivate its students, or it could 
improve relations with its teachers. Though schools’ 
purpose may be to improve student scores in math 
and reading to avoid the sanctions of the high-stakes 
testing policy, general improvements of this kind 
might produce across-the-board increases in student 
achievement. Second, sanctions under high-stakes test- 
ing could improve student achievement in low-stakes 
subjects if the resulting mastery of high-stakes subjects 
facilitates mastery of other subjects. 

Though a true test of the prevalence of either of these 
kinds of explanations is not available to us, we have 
discovered evidence suggesting that student profi- 
ciency in science has increased under the high-stakes 
sanctions primarily because the improvements that 
students have made in math and reading have en- 
hanced their ability to learn science material as well. 
However, we stress that future research, using stronger 
strategies than are available here, is needed to explain 
the positive relationship between high-stakes testing 
and student improvement in low-stakes subjects. 

FLORIDA’S A+ ACCOUNTABILITY 
PROGRAM 

Florida is among the nation’s leaders in high-stakes 
testing. Most agree that the state’s A+ 
Accountability Program (A+) is one of the most 
aggressive programs of its kind. It was clearly a tem- 
plate for the federal NCLB law. 

Each year, the state administers a standardized test, 
the Florida Comprehensive Assessment Test (FCAT), 
in math and reading to all public school students in 
the state who are enrolled in grades 3-10. Schools 
receive letter grades, from A to F, based on the per- 
centage of their students meeting particular achieve- 
ment levels and the academic progress of students 
in certain subgroups. 

There are two important reasons that we might expect 
schools deemed to be failing to respond positively. 
Those that have received an F grade for the first time 
may be shamed into improving their performance (Figlio 
and Rouse 2005; Ladd 2001; Carnoy 2001; Harris 
2001). Those that have received at least one failing 
grade may decide to raise their performance because 
they fear attrition of their student body. Such attrition 
may occur because the state issues Opportunity 
Scholarships (vouchers) to students in schools that have 
received two failing grades within a four-year period; 
students can use these vouchers to attend another public 
school or a private school willing to accept the voucher 
as full tuition payment. 1 In this paper, we are not particularly 
concerned with whether these or any other phenom- 
ena drive increases in student performance in either 
high- or low-stakes subjects. 

A change in the administration of the program provided 
an interesting avenue for researching Florida’s policy. 
In the program’s initial years, school grades were based 
on the percentage of students earning level 2 (the 
second-lowest of five levels) or above on the read- 
ing, math, and writing portions of the FCAT and the 
percentage of eligible students tested. A school could 
avoid earning an F if at least 50% of tested students 
scored at achievement level 3 in writing, or if 60% of 
tested students scored at level 2 in reading or math 
and 90% of eligible students were tested. If a school 
met one or two of these criteria, it earned a D. If it met 
all three of these criteria, it earned a C. Schools with 
particular subpopulations meeting all three received a 
B. To earn an A, schools had to meet more stringent 
requirements for the overall student population and 
each subpopulation. The opinion was widespread 
that schools had determined that satisfactory scores in 
writing were the easiest to achieve under the original 
school-grading format and that the teaching of writ- 
ing in struggling schools therefore stressed techniques 
geared to the writing portion of the exam. 

Starting in the 2001-02 school year, Florida adopted 
an accumulating point system to evaluate schools. 
Schools earn one point for each percent of students 
who score in achievement levels 3, 4, or 5 (the three 
highest of five levels) in reading and one point for 
each percent of students who score in levels 3, 4, or 
5 in math. Schools earn one point for each percent 
of students scoring 3.5 or above in writing, which is 
graded from 1 to 6. Schools earn one point for each 
percent of students who make learning gains in reading 
and one point for each percent of students who 
reach a higher achievement level or maintain a 3, 4, 
or 5 in math. Schools also earn one point for each 
percent of the lowest-performing readers who make 
test-score improvements in the year in question. A 
school that earns fewer than 280 points receives a 
failing grade. The multifarious nature of the grading 
process has probably made direct manipulation of the 
system relatively difficult. 
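
The point accumulation described above can be illustrated with a minimal sketch. It assumes that each of the six components is expressed as a percentage from 0 to 100 and that the components are simply summed; the class and function names are hypothetical and are not part of Florida's official formula. The only threshold taken from the text is the 280-point cutoff for a failing grade.

```python
from dataclasses import dataclass, astuple

# From the text: a school earning fewer than 280 points receives an F.
F_THRESHOLD = 280


@dataclass(frozen=True)
class SchoolComponents:
    """Six percentage components (0-100) named in the text.

    Field names are hypothetical labels chosen for this illustration.
    """
    reading_proficient: float    # % scoring at levels 3, 4, or 5 in reading
    math_proficient: float       # % scoring at levels 3, 4, or 5 in math
    writing_proficient: float    # % meeting the writing standard
    reading_gains: float         # % making learning gains in reading
    math_gains: float            # % reaching a higher level or holding 3-5 in math
    lowest_readers_gains: float  # % of lowest-performing readers who improved


def total_points(school: SchoolComponents) -> float:
    """One point per percentage point in each component, summed."""
    return sum(astuple(school))


def receives_f(school: SchoolComponents) -> bool:
    """True when the point total falls below the failing threshold."""
    return total_points(school) < F_THRESHOLD
```

Because the total is a sum over six separate measures, a school near the threshold can cross it through small changes in any one component, which is what makes the point total a useful continuous control in the analysis.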

Beginning in the 2002-03 school year, Florida public 
schools also were required to test for proficiency in 
science when they administered the math and read- 
ing exams. The science part of the FCAT is currently 
administered to all public school students in grades 5, 
8, and 11. The results of the science exam have now 
been incorporated directly into the accountability pro- 
gram; but during the years of our analysis, they had 
no effect on the school’s grade, nor did they represent 
any other form of official accountability. 

Several researchers have evaluated the impact of the 
A+ program on the academic gains of public school 
students in math and reading (Rouse et al. 2007; 
Greene and Winters 2004; Chakrabarti 2005; Figlio and 
Rouse 2005; West and Peterson 2006; Greene 2001). 
Though there is some disagreement about which 
aspect of the accountability policy was effective (the 
threat of vouchers or the shame of an F grade), each of 
these analyses found that the policy improved the math 
and reading proficiency of students in public schools 
designated as failing. We are aware of no previous 
research analyzing the impact of the A+ program on 
science test scores. 

DATA AND METHOD 2 

We utilize a data set provided by the Florida 
Department of Education that contains 
test scores in math, reading, and science 
as well as demographic characteristics of the uni- 
verse of students enrolled in grades 3-10 in Florida 
public schools. We supplement the individual-level 
data set with school-level information — specifically, 
the school’s point total and letter grade under A+ at 
the end of the 2001-02 school year. To simplify the 
comparison of scores in different subjects, we convert 
the FCAT scores of students who were in our sample 
into a scale score with a mean of 0 and standard 
deviation of 1. 
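
The conversion to a common scale can be sketched as follows. This is a minimal illustration, assuming that standardization is done within a single subject over the students in the sample and that the population standard deviation is used; the text does not specify either detail, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def standardize(scores):
    """Rescale raw scores to mean 0 and standard deviation 1.

    Assumption: the population standard deviation (pstdev) is used,
    since the sample is described as the universe of students.
    """
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        raise ValueError("all scores are identical; cannot standardize")
    return [(s - mu) / sigma for s in scores]
```

After this transformation, an effect reported as 0.08 means that scores rose by 0.08 of a standard deviation, which allows effects in math, reading, and science to be compared directly.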

In order to align our findings with those in the pre- 
vious literature, we utilize the comparison strategy 
implemented in a 2007 study conducted by Rouse et 
al. that evaluated the impact of Florida’s A+ policy on 
student achievement in math and reading. Our sample 
consists of the universe of Florida public school stu- 
dents who were enrolled in the fifth grade in 2002-03 
and were promoted at the end of the prior year. This 
was the first class of fifth-grade students attending 
a school that had received a letter grade under the 
revised point system of the A+ policy. We focus on 
only those students with both a math and reading test 
score reported in 2001-02 and 2002-03. 

We supplement the individual-level data with admin- 
istrative information on the school’s grade and points 
earned under the A+ system during the summer of 
2002. In the analyses that follow, along with observable 
characteristics of the student and school we control for 
both the school’s letter grade at the end of the 2001-02 
year and the total points earned under the grading 
system. The idea here is that controlling for the points 
earned by the school accounts for differences in school 
performance, and thus the remaining differences in 
the performance of students at schools receiving an 
F grade must reflect responses to the incentives that 
exist under the accountability policy. 
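
The logic of this comparison can be illustrated with a simplified sketch. It is not the specification estimated in this paper: it assumes a sharp cutoff at 280 points, a single linear control for points (centered at the cutoff and rescaled, a common convention that we adopt here only for illustration), and no student or school covariates. All function names are hypothetical.

```python
F_CUTOFF = 280  # from the text: fewer than 280 points is a failing grade


def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def design_row(points):
    """Intercept, F-grade indicator, and points centered at the cutoff."""
    failed = 1.0 if points < F_CUTOFF else 0.0
    centered = (points - F_CUTOFF) / 100.0  # rescaled for numerical stability
    return [1.0, failed, centered]


def estimate_f_effect(school_points, outcomes):
    """Coefficient on the F indicator, holding points constant."""
    X = [design_row(p) for p in school_points]
    return ols(X, outcomes)[1]
```

The coefficient on the F indicator is the quantity of interest: it measures the difference in outcomes between students at failing and non-failing schools after the smooth relationship between points and outcomes has been removed.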

We use this general comparison strategy to perform 
a series of cross-sectional regressions. We are first 
concerned with discovering whether students made 
academic gains in science due to the F sanction, and 
we also confirm the finding of an impact of the sanc- 
tion on student proficiency in math and reading. We 
then evaluate the extent to which any gains made by 
students in science due to the F sanction were driven 
by improvements in the overall performance of the 
school or a symbiotic relationship between learning 
in the high- and low-stakes subjects. 